8.1 - Concept - A More Generalized Agent

The agents developed in Part 5, while effective at their specific tasks, are fundamentally limited by their fixed-size observation and action spaces. They are brittle; the MicroBot has no capacity to handle a second Zergling, and the MacroBot has no concept of building a Barracks.

To progress toward an agent capable of playing a full game of StarCraft II, we must adopt an architecture that can handle a variable number of entities and actions. This section introduces the concepts behind a more generalized agent, inspired by advanced architectures like DRL-RM.

The Core Architectural Shift

Our previous approach is insufficient for generalized gameplay. The new approach must be able to process a dynamic list of units and decompose complex decisions.

| Aspect | Fixed Vector Architecture (Previous Bots) | Entity-Based Architecture (Generalized Bot) |
| --- | --- | --- |
| Problem | The observation vector has a fixed size and meaning (e.g., [marine_hp, zergling_hp]). This breaks if the number or type of units changes. | How can the agent perceive and act upon a dynamic number of different units? |
| Solution | "Flattened Features." A single vector representing a highly specific, pre-defined scenario. | "List of Entities." The observation is a list of feature vectors, where each vector represents a single unit. This list can grow or shrink. |
| Limitation | Cannot scale beyond its pre-defined scenario. | Requires a more complex policy network to process variable-length, unordered data. |
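
To make this concrete, here is a minimal sketch of an entity-based observation builder in plain NumPy. The per-unit feature layout, the MAX_UNITS cap, and the unit-type IDs are illustrative assumptions, not the book's final design; padding plus a validity mask is just one common way to give a variable-length list a fixed array shape.

```python
import numpy as np

MAX_UNITS = 16   # assumed cap on entities per observation
FEATURES = 5     # hypothetical per-unit features: [hp, x, y, is_friendly, unit_type]

def build_observation(units):
    """Pack a variable-length list of unit dicts into a padded
    (MAX_UNITS, FEATURES) array plus a mask marking real entries."""
    obs = np.zeros((MAX_UNITS, FEATURES), dtype=np.float32)
    mask = np.zeros(MAX_UNITS, dtype=bool)
    for i, u in enumerate(units[:MAX_UNITS]):
        obs[i] = [u["hp"], u["x"], u["y"], float(u["is_friendly"]), u["unit_type"]]
        mask[i] = True
    return obs, mask

# Two observations with different unit counts share one array shape.
obs_a, mask_a = build_observation([
    {"hp": 45, "x": 3.0, "y": 7.0, "is_friendly": True, "unit_type": 48},    # a Marine
    {"hp": 35, "x": 5.0, "y": 6.0, "is_friendly": False, "unit_type": 105},  # a Zergling
])
obs_b, mask_b = build_observation([
    {"hp": 45, "x": 3.0, "y": 7.0, "is_friendly": True, "unit_type": 48},
    {"hp": 35, "x": 5.0, "y": 6.0, "is_friendly": False, "unit_type": 105},
    {"hp": 35, "x": 5.5, "y": 6.5, "is_friendly": False, "unit_type": 105},  # a second Zergling
])
```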

The Decomposed Action Space

Instead of a single action space in which the agent picks one option per step (e.g., "Kite"), a generalized agent must make a sequence of related sub-decisions. This is often called a decomposed or auto-regressive action space. The agent must learn a policy to answer a series of questions:

The "Who -> What -> Whom" Decision Process:

+---------------------------+     +---------------------------+     +---------------------------+
| Step 1:                   |     | Step 2:                   |     | Step 3:                   |
| SELECT ACTOR              |---->| SELECT ABILITY            |---->| SELECT TARGET             |
+---------------------------+     +---------------------------+     +---------------------------+
| (Which of my units        |     | (What action should       |     | (Which enemy unit or point|
|  should act?)             |     |  the selected unit take?) |     |  should be the target?)   |
+---------------------------+     +---------------------------+     +---------------------------+
  Output: index of a                Output: index of an               Output: index of a
  friendly unit                     ability (Move, Attack)            target unit

The final command is an assembly of these three sub-decisions: "Unit i uses ability j on target k."
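
To see how the three heads compose into one command, here is a hedged sketch of auto-regressive sampling in plain NumPy. In practice the logits would come from the policy network; the dummy logit arrays and helper names here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
ABILITIES = ["Move", "Attack"]  # assumed ability set for illustration

def sample(logits):
    """Sample an index from a categorical distribution given raw logits."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

def decomposed_action(actor_logits, ability_logits_fn, target_logits_fn):
    # Step 1: SELECT ACTOR -- which friendly unit acts.
    actor = sample(actor_logits)
    # Step 2: SELECT ABILITY -- conditioned on the chosen actor.
    ability = sample(ability_logits_fn(actor))
    # Step 3: SELECT TARGET -- conditioned on actor and ability.
    target = sample(target_logits_fn(actor, ability))
    return actor, ability, target  # "Unit i uses ability j on target k"

# Dummy logit producers standing in for policy-network heads.
actor, ability, target = decomposed_action(
    actor_logits=np.array([0.2, 1.5]),                         # two friendly units
    ability_logits_fn=lambda a: np.array([0.1, 0.9]),          # Move vs. Attack
    target_logits_fn=lambda a, ab: np.array([0.3, 0.3, 1.0]),  # three enemy units
)
print(f"Unit {actor} uses {ABILITIES[ability]} on target {target}")
```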

The Role of a Relational Module

A key component of advanced architectures like DRL-RM is a relational module (often an attention mechanism or graph neural network).

  • Purpose: To process the unordered list of unit features and learn the critical relationships between them.
  • Function: It allows the agent to answer questions like, "Which of my units is currently being attacked?" or "Which enemy unit is the most isolated and vulnerable?" This is essential context for making intelligent decisions in the "Who -> What -> Whom" process. A minimal sketch of such a module follows below.
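
For intuition, here is a minimal sketch of the simplest relational module, scaled dot-product self-attention over the entity list, in plain NumPy. The weight matrices would be learned during training; random values stand in here, and the feature size matches the earlier observation sketch only by assumption. Each entity's output becomes a mixture of all entities' features, weighted by relevance, which is how the network can encode relations like "this Marine is near that Zergling."

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # per-entity feature size (matches the observation sketch above)

def self_attention(entities, Wq, Wk, Wv):
    """Scaled dot-product self-attention over an (n_units, d) entity list.

    Works for any number of rows -- the same weights relate 2 units or 20."""
    Q, K, V = entities @ Wq, entities @ Wk, entities @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (n, n) pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over entities
    return weights @ V                               # relation-aware features

Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
units = rng.standard_normal((3, d))   # e.g., one Marine, two Zerglings
print(self_attention(units, Wq, Wk, Wv).shape)  # (3, 5): one vector per unit
```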

Our Implementation Goal

Building a full, research-grade DRL-RM agent with an attention module is a complex Ph.D.-level project. Our goal in this section is more modest and practical:

  • To implement a simplified version of this architecture.
  • To create an environment where the observation is a list of unit feature vectors.
  • To create a decomposed action space that selects an actor, ability, and target.
  • To use a standard Multi-Layer Perceptron (MLP) as a stand-in for a more complex relational module (see the sketch after this list).
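
One plausible shape for that MLP stand-in is sketched below. PyTorch is an assumption here, and the dimensions reuse the illustrative constants from the earlier sketches. The padded entity list is simply flattened into a single vector, and three output heads emit the logits for each sub-decision:

```python
import torch
import torch.nn as nn

MAX_UNITS, FEATURES, N_ABILITIES = 16, 5, 2  # assumed sizes, as above

class SimplePolicy(nn.Module):
    """MLP stand-in for a relational module: flattens the padded
    entity list and emits one logit head per sub-decision."""
    def __init__(self, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(MAX_UNITS * FEATURES, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.actor_head = nn.Linear(hidden, MAX_UNITS)      # which unit acts
        self.ability_head = nn.Linear(hidden, N_ABILITIES)  # what it does
        self.target_head = nn.Linear(hidden, MAX_UNITS)     # whom it targets

    def forward(self, obs):                 # obs: (batch, MAX_UNITS, FEATURES)
        h = self.trunk(obs.flatten(1))
        return self.actor_head(h), self.ability_head(h), self.target_head(h)

policy = SimplePolicy()
obs = torch.zeros(1, MAX_UNITS, FEATURES)   # a dummy padded observation
actor_logits, ability_logits, target_logits = policy(obs)
print(actor_logits.shape, ability_logits.shape, target_logits.shape)
```

A real implementation would also mask the logits of padding slots and dead units (for example, by setting them to a large negative value) before sampling, so the agent cannot select an invalid actor or target.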

This will provide a foundational, working example of how to structure a more generalized agent, paving the way for more advanced implementations.